Add `DType::Union` and `UnionVariants` by connortsui20 · Pull Request #7890 · vortex-data/vortex

connortsui20 · 2026-05-11T19:47:04Z

Summary

Tracking issue: #7882

Adds the scaffolding and boilerplate for the new logical DType::Union type.

This is the first PR of probably a bunch, so it simply adds the DType::Union variant and the UnionVariants type that DType::Union holds. Other than that, it is mostly just proto and flatbuffers code and other boilerplate that is easy to implement.

A lot of the code is essentially duplicated from implementations of Struct. It could have been nice if we could reuse StructFields, but we need to additionally store a type IDs field that maps the different variant types to tag indexes (i.e. 0, 1, 2, 5), so we might as well have a new type for this that is named more appropriately.

Also note that the tags do not have to be consecutive (to support potential schema evolution, and also it does make things like casts easier).

I've also enforced that variant names are distinct. The Arrow spec does not make any guarantee about this, but I've chosen the sane path that DuckDB also chooses.

Finally, I reordered some of the dtype match statements so that struct and union are always next to each other after list and fsl. This did cause a bit of extra churn... but with so many types now I think it is important that similar things are next to each other.

Edit: There's 1000+ lines of flatbuffers diff because I think my local flatc install is different from the one on CI?

API Changes

Adds Union variant to DType

Testing

Basic testing (mostly for the serialization roundtrips).

codspeed-hq · 2026-05-11T21:16:57Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 1210 untouched benchmarks

_{Comparing ct/add-union (bd308d7) with develop (3e93048)}

connortsui20 · 2026-05-11T21:22:40Z

+        DType::Union(variants, _) => variants
+            .variants()
+            .all(|variant| supports_uncompressed_size_in_bytes(&variant)),


Happy to remove this until the actual uncompressed_size_in_bytes is implemented later?

connortsui20 · 2026-05-11T21:33:07Z

+            // TODO(connor): This seems wrong...
            DType::FixedSizeList(_, list_size, _) => value.as_list().len() == *list_size as usize,
+            // TODO(connor): This also seems wrong...
            DType::Struct(struct_fields, _) => value.as_list().len() == struct_fields.nfields(),


connortsui20 · 2026-05-11T21:35:28Z

I think my local version of flatc is wrong? Or rather its just different from CI

❯ flatc --version
flatc version 25.12.19

connortsui20 · 2026-05-12T14:36:57Z

#7899

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

joseph-isaacs · 2026-05-12T14:57:06Z

+/// Per Arrow's spec, the per-row type tag is an `int8`. By default, tag `i` selects the child at
+/// offset `i` (`type_ids = [0, 1, ..., N-1]`). Schemas may also use non-consecutive tags (e.g.
+/// `[0, 5, 7]`), in which case the value of `type_ids[i]` is the tag used in the data to select the
+/// child at offset `i`. Supporting non-consecutive tags from v1 lets the schema remove children
+/// without renumbering the remaining tags.


why i8 not u8?

Its worth explaining why we have type_ids and non-consecutive tags too?

Arrow uses i8 and the maximum number of variants is 128 so this is fine imo

joseph-isaacs · 2026-05-12T14:57:52Z

+/// Variant names must be distinct. Unlike [`StructFields`](crate::dtype::StructFields), which
+/// permits duplicate field names for Arrow/Parquet round-trip fidelity, duplicates have no
+/// meaningful semantics in a sum type and are rejected at construction.
+///


so we cannot round trip arrow?

If you use index access not name then it does make sense to have duplicate fields. Fields are just metadata in arrow.

joseph-isaacs · 2026-05-12T15:00:39Z

 /// // Accessing a field by name will yield the first
 /// assert_eq!(fields.field("int_col").unwrap(), DType::Primitive(PType::I32, Nullability::Nullable));
 /// ```
+#[allow(


expect please

expect doesnt work here (for some reason, I have no idea)

connortsui20 · 2026-05-12T15:12:26Z

@joseph-isaacs wants to split this out, so I will remove UnionVariants from this PR

connortsui20 added the changelog/feature A new feature label May 11, 2026

connortsui20 force-pushed the ct/add-union branch from d02a438 to 1fa4a6d Compare May 11, 2026 21:02

connortsui20 mentioned this pull request May 11, 2026

Tracking Issue: Add DType::Union #7882

Open

8 tasks

connortsui20 force-pushed the ct/add-union branch from 1fa4a6d to ad61b12 Compare May 11, 2026 21:14

connortsui20 marked this pull request as ready for review May 11, 2026 21:16

connortsui20 requested review from AdamGS, a10y and robert3005 May 11, 2026 21:16

connortsui20 force-pushed the ct/add-union branch 3 times, most recently from 4e8802c to 4e86fc3 Compare May 11, 2026 21:32

connortsui20 commented May 11, 2026

View reviewed changes

add union type

bd308d7

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

connortsui20 force-pushed the ct/add-union branch from 4e86fc3 to bd308d7 Compare May 12, 2026 14:42

connortsui20 requested a review from joseph-isaacs May 12, 2026 14:43

joseph-isaacs reviewed May 12, 2026

View reviewed changes

connortsui20 closed this May 12, 2026

connortsui20 deleted the ct/add-union branch May 12, 2026 15:17

connortsui20 mentioned this pull request May 12, 2026

Standardize DType match arms to follow variant declaration order #7897

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `DType::Union` and `UnionVariants`#7890

Add `DType::Union` and `UnionVariants`#7890
connortsui20 wants to merge 1 commit into
developfrom
ct/add-union

connortsui20 commented May 11, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented May 11, 2026 •

edited

Loading

Uh oh!

connortsui20 May 11, 2026

Uh oh!

Uh oh!

connortsui20 May 11, 2026

Uh oh!

connortsui20 commented May 11, 2026

Uh oh!

connortsui20 commented May 12, 2026

Uh oh!

joseph-isaacs May 12, 2026 •

edited

Loading

Uh oh!

connortsui20 May 12, 2026

Uh oh!

joseph-isaacs May 12, 2026

Uh oh!

joseph-isaacs May 12, 2026

Uh oh!

connortsui20 May 12, 2026

Uh oh!

connortsui20 commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

connortsui20 commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

API Changes

Testing

Uh oh!

codspeed-hq Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Uh oh!

connortsui20 May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

connortsui20 May 11, 2026

Choose a reason for hiding this comment

Uh oh!

connortsui20 commented May 11, 2026

Uh oh!

connortsui20 commented May 12, 2026

Uh oh!

joseph-isaacs May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

connortsui20 May 12, 2026

Choose a reason for hiding this comment

Uh oh!

joseph-isaacs May 12, 2026

Choose a reason for hiding this comment

Uh oh!

joseph-isaacs May 12, 2026

Choose a reason for hiding this comment

Uh oh!

connortsui20 May 12, 2026

Choose a reason for hiding this comment

Uh oh!

connortsui20 commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

connortsui20 commented May 11, 2026 •

edited

Loading

codspeed-hq Bot commented May 11, 2026 •

edited

Loading

joseph-isaacs May 12, 2026 •

edited

Loading